Search CORE

17 research outputs found

Mixture of Expert/Imitator Networks: Scalable Semi-supervised Learning Framework

Author: Inui Kentaro
Kiyono Shun
Suzuki Jun
Publication venue
Publication date: 19/11/2018
Field of study

The current success of deep neural networks (DNNs) in an increasingly broad range of tasks involving artificial intelligence strongly depends on the quality and quantity of labeled training data. In general, the scarcity of labeled data, which is often observed in many natural language processing tasks, is one of the most important issues to be addressed. Semi-supervised learning (SSL) is a promising approach to overcoming this issue by incorporating a large amount of unlabeled data. In this paper, we propose a novel scalable method of SSL for text classification tasks. The unique property of our method, Mixture of Expert/Imitator Networks, is that imitator networks learn to "imitate" the estimated label distribution of the expert network over the unlabeled data, which potentially contributes a set of features for the classification. Our experiments demonstrate that the proposed method consistently improves the performance of several types of baseline DNNs. We also demonstrate that our method has the more data, better performance property with promising scalability to the amount of unlabeled data.Comment: Accepted by AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Lessons on Parameter Sharing across Layers in Transformers

Author: Kiyono Shun
Takase Sho
Publication venue
Publication date: 13/04/2021
Field of study

We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares parameters for one layer with all layers such as Universal Transformers (Dehghani et al., 2019), to increase the efficiency in the computational time. We propose three strategies: Sequence, Cycle, and Cycle (rev) to assign parameters to each layer. Experimental results show that the proposed strategies are efficient in the parameter size and computational time. Moreover, we indicate that the proposed strategies are also effective in the configuration where we use many training data such as the recent WMT competition

arXiv.org e-Print Archive

ニューラルネットワークを用いた自然言語処理のための大規模半教師あり学習

Author: Kiyono Shun
Publication venue
Publication date: 25/03/2022
Field of study

Tohoku University乾健太郎課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ